Treetalk-d: a Machine Learning Approach to Dutch Word Pronunciation

Author

  • Bertjan Busser
Abstract

We present experimental results concerning the application of the IGTree decision-tree learning algorithm to Dutch word pronunciation. We evaluate four different Dutch word pronunciation systems configured to test the utility of modularization of grapheme-to-phoneme transcription (G) and stress prediction (S). Both training and testing data are extracted from the CELEX II lexical database. Experiments yield full word transcription accuracies (stressed and syllabified phonetic transcription) of roughly 75%, and 97% accuracy on G at the letter level. The best system performs G and S in sequence, using a context of four letters left and right per grapheme-phoneme mapping.
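
The "four letters left and right" context can be illustrated with a minimal sketch of how a word might be cut into fixed-width letter windows, one per grapheme-phoneme mapping. This is not the authors' code and it does not implement IGTree; the function name `letter_windows`, the padding symbol `_`, and the example word are assumptions made purely for illustration.

```python
def letter_windows(word, left=4, right=4, pad="_"):
    """Return one fixed-width window per letter of `word`.

    Each window holds `left` context letters, the focus letter,
    and `right` context letters. Positions beyond the word
    boundary are filled with `pad` (a hypothetical choice).
    """
    padded = pad * left + word + pad * right
    width = left + 1 + right
    return [padded[i:i + width] for i in range(len(word))]


# Each window would serve as the feature vector for classifying
# the phoneme of its focus letter (the character at index `left`).
for window in letter_windows("boek"):
    print(window, "focus:", window[4])
```

In a setup like this, a decision-tree learner would be trained on such windows paired with the target phoneme (and, in a separate module, the stress marker) for the focus letter.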

Similar resources

Automatic rule-based generation of word pronunciation networks

In this paper a method for generating word pronunciation networks for speech recognition is proposed. The networks incorporate different acceptable pronunciation variants for each word. These variants are determined by applying pronunciation rules to the standard pronunciation of the words. Instead of a manual search, an automatic learning procedure is used to compose a sensible set of rules. T...

A machine learning approach to Swedish word pronunciation

This study focuses on word pronunciation in Text-to-Speech systems for Swedish. The purpose is to investigate whether machine learning techniques match knowledge-based systems in Swedish word pronunciation. The experiments show a maximum grapheme accuracy of just over 97%, and word accuracies from 67.0% for word pronunciation excluding stress assignment, which compares favourably to existing kn...

Meta-Learning for Phonemic Annotation of Corpora

We apply rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high accuracy automatic annotation of corpora with pronunciation information. The task we address in this paper consists of generating phonemic representations reflecting the Flemish and Dutch pronunciations of a word on the basis of its orthographic representation (which in t...

Modeling pronunciation variation for a Dutch CSR: testing three methods

This paper describes how the performance of a continuous speech recognizer for Dutch has been improved by modeling pronunciation variation. We used three methods to model pronunciation variation. First, within-word variation was dealt with. Phonological rules were applied to the words in the lexicon, thus automatically generating pronunciation variants. Secondly, cross-word pronunciation variat...

Improving the performance of a Dutch CSR by modeling within-word and cross-word pronunciation variation

This article describes how the performance of a Dutch continuous speech recognizer was improved by modeling pronunciation variation. We propose a general procedure for modeling pronunciation variation. In short, it consists of adding pronunciation variants to the lexicon, retraining phone models and using language models to which the pronunciation variants have been added. First, within-word pr...

Publication date: 1998